tv American Artifacts CSPAN November 17, 2013 7:00pm-7:31pm EST
we travel to learn about the project in the library. the collections exist in digital form for scholars, people with disabilities and the general public. we begin by hearing from brewster kahle, founder of the archive. >> the idea is to build the library version two. the idea is can you go and make all books, music, video available to anybody that's curious enough. we're trying to do pieces that are missing. maybe large scale materials or materials people are putting on their websites every day and coming down to give that a permanent home. physical materials, and offering
them for free public access anywhere in the world. the digital materials are in a church we've bought in san francisco. i love looking at them. every time the light blinks someone is uploading or downloading from the archive. we get 3 million viewers a day. books alone, we get 10 to 15 million books a month downloaded from the internet archive. we have the primary servers here. we have backup servers in richmond, california and in redwood city. we have a partial copy in egypt
and amsterdam. the idea is to have copies in other places so that we don't have the fate of burning and disappearing. it has been my idea to try to take advantage of this change that we can actually store things online and make them broadly available. it was always in the air. the promise of having the library of congress on your desk forever. it was like why don't we just go and build that? it has been a long journey and then building some of the computers and systems that became the world wide web. the early days trying to get this open structure to work and building the 1996 archive.
by archiving the world wide web, television, music, movies. and now, we are digitizing 1,000 books a day and making them publicly available. my name is jesse bell. i'm the coordinator here at the san francisco scanning center for internet archive. we have 33 scanning locations and this is one of them in which we are working on digitizing books, microfilm and also regular film. we have three main steps. our first step is to create a web page for each item. we have our second step which is the image capture step. tina is working on our second step. this is the machine that we designed. it is a way to digitize books that is nondestructive that
also, just as important, provides high quality images once the book is online. we have two professional grade cameras, one in the upper right and one in the upper left. what tina will do is go through each page and take an image. we have two panes of glass on top and what that does is, it presses the page so that it is flat and that will provide a nice, clear image once it is online. it is a versatile setup. we can scan all types of books and we want to be prepared for that. after the image capture step we have the third and final step. we call this the republishing
step. we want to ensure that every page is present and make sure the images are clear and the content has been captured. what she is doing now is performing the quality check and also setting the book up now to be used online. there's a crop box on each page so all of the content inside that crop box will display online. and once karina verifies that the quality is correct and the presentation is correct she will upload the book. it takes 24 to 48 hours for the book to appear on our website. once you look at the item on our website, there's a couple different ways to look. we have the in-browser reader to browse through, search the text and view in a number of ways
depending on your preference. there's a number of file formats to download as well. a pdf, a kindle file and an epub file if you have an ebook reader. we can scan 800 to 1,000 pages an hour. combined with all of the steps it takes roughly an hour. this is per book. we have over three million books online. when we digitize books we put the images online and put it in lots of different formats. we can make it in pdf, the kindle, the nook and we have to keep converting these materials to new formats. the one i'm excited about is the talking book that is made available to the blind or dyslexic. even in-copyright books are
available to them if they go to the library of congress and get qualification to be blind enough to have access to copyrighted materials or dyslexic enough. or just elderly people do this. we have digitized everything from harry potter and the whole shebang. those are available for free. there are 500,000 books that are available for those. if they have to buy a commercial device that will speak to them, it talks a little bit like this. but it is a format that we can move these books to. investigation before the president's commission on the assassination of president kennedy pursuant to executive order 11130. executive order creating a commission reporting upon the facts relating to the assassination of the president and the death of the man charged with the assassination.
so there's now openlibrary.org, the website to go and find these books. not only books for blind and dyslexic but also books to borrow. there are 250,000 books, only one at a time. we have 250 million of those available. it's very popular. we often don't have the newest books. we've got something on every subject. actually the whole internet archive is a bargain. it is $12 million a year. there are 150 people that work for the archive but the reason why it can be that inexpensive is we are doing a tiny part of
the work. there are volunteers all over the world that are building the collections. we are hosting them. it is like going and saying that the publisher is the author. the library isn't the author either. the real value is the cultural materials are being built. >> half of the money for the archive comes from the money to pay for books or collect web pages. we have a room in the library of congress itself where we are digitizing books all day long. half of the money comes from foundations or private individuals. the wayback machine is a way to get access to the web holdings. we started collecting the
world wide web in 1996 by having a computer contact every website and click on every link on it and download the pages and images to be able to reproduce that website after the fact. we do this for all of them every two months. so basically take a snapshot, every two months and then again snapshot, snapshot, snapshot. it is starting to get big. we collect a billion pages each week. the total number of pages in the collection is 283 billion. that is just the web data. mega, giga, peta.
that is made available to people sometimes in bulk but most people go to the wayback machine which is named after the rocky and bullwinkle show. and it makes it so that you can type in the url and see the web as it was. out of print webpages. it gets 600,000 people a day using it. it has a database of 283 billion now. it gets 1,000 to 2,000 a second. it is much more popular than i thought it would be. it is serving a role of making the web reliable. so in this digital world we have to be proactive to go and collect the materials. the way the whole wayback machine and archiving thing started for us was we started working with the smithsonian institution.
they were interested in having the websites for that whole campaign archived and put a display in the presidential memorabilia room. he said maybe it is going to be like the bumper sticker. we want that bumper sticker. that was the first collection that we did. we worked closely with the library of congress. our next one was working to archive the year 2000 websites. yes, we worked closely with the library of congress. they have been great partners. we give them back the data and they store it forever. they have been instrumental in making it so that the web is a living document through the internet archive and we then help them by building their collections. they work with the national archives every two years to
archive the end of term government websites. and that is available from the nara.gov website. but it is when you go and use that, it is using our servers. they keep an offline copy. but we are actually the ones that host those materials. the different industries are set up different ways to protect their existing interests. we are finding that every field needs to be addressed differently. we do that. if people want to be taken out of it, we do that. it's basically an opt out type system. we lend books that are more modern. the television collection which is a fantastic new resource you get to search based on closed
captions on the united states news programs the last three years for free. you only get 30 seconds back. if you want more, then you borrow the whole program then we put it on a dvd and you send it back. each one of these has evolved a different method. if we deal with them respectfully and as a library, nonprofit, it seems to all around work. we have over a million different moving images to have available to download and remix. the archive is fantastic. there is lots of things that people have uploaded. there is one called fedflix which is a collection that comes from the u.s. government. carl is a person who's gotten
through publicresource.org, the old tapes from the archives and made them available for free public reuse. he has an obsession with bringing public access to public materials so that all the outputs of the united states government are public domain. the idea of the internet is you should be able to then get to it. >> our government is the caretaker for these stores, which lie fallow today, but they could become a platform that provides access to all. prior efforts have been half hearted. we should be spending a minimum of $250 million a year for a decade for a scanning objective. the smithsonian, the national archives, the library of congress and national library of
medicine must work together to develop a strategy compelling enough to make congress, the foundations and the public all clamor to help them create this platform. the public domain is still often difficult to get to. it is like building a national park but then making it hard for people to get to it. carl has taken on the multidecade project to go and get materials. sometimes he has to buy them to get access to them and make them publicly available. he is one of my heroes. it used to cost quite a bit to get
copies of these. you see what is there and then download that and do whatever you want. >> i'm rick prelinger. i'm founder of the archives of film located in san francisco. i'm also a board member of prelinger archive. both of the archives collections have large collections of federal government produced films. and also some films made by states and the local governments as well. the federal government historically is the world's biggest media maker. and their films run the gamut from industrial training films about aircraft riveting to policy related films that might be a recording of a speech to explanation of a particular program, to great works of art like the plow that broke the plains. films that a lot of people know. years of lightning, day of drums, life of jfk. but there is a lot of everyday government films that are not so
well-known. one that i like a lot is "tuesday in november". this was made by john houseman and directed by nicholas ray. the score is by virgil thomson. this is a quite interesting film. it was made in 1945 by the office of information to be seen overseas and it's in simple language, in declarative sentences. it's an american city. its name is riverton. it is not too big nor too small. it is early morning of the first tuesday in november. this is an american city. a city that is not very large, not very rich, not very old. >> and it is made to be translated into many languages. the idea of the film is that we are such a strong democratic country that we can hold a presidential election in the
middle of a war, 1944 and the country will survive. and it dramatizes how the people are invested. it is certainly a film that people should see today. i think we tend to think that it has always been this way. people that feel disenfranchised or anti-government, whatever their rationale is for those feelings, i think it is interesting for them to look at a period when people were more engaged and try to understand why. the federal government films, i like the ones with the spokescritters. we all know about smokey bear and mr. zip. but the adventures of junior raindrop about erosion.
>> out tumble millions of earthbound raindrops. >> ouch. fine welcome this is. >> you know, little runaway raindrops love forests where too many trees have been cut down. >> people always love the military etiquette films. >> in addition to cleanliness good grooming means good taste. one note can ruin the total impression. to be well groomed in the back your hair should neither be too long nor too short but an appropriate length in the cap. it should not extend below the bottom edge of the collar. this obviously won't do and neither will this. you are in the women's army not
the men's and should always strive to look feminine. >> the american nuclear testing program was heavily, heavily documented and there was the scientific photography which was unedited photography and quite often those were used in other films. and those were oriented to promote a particular program. i think the government films that have meant the most to a lot of people are the nasa films. so much of our sense of space and exploration and pioneering has been created by footage that was shot by employees in space. nasa has made wonderful films over the years. tower clear.
we have examples of this in the past. we have a public library system that has done a tremendous job at being free to all and we need to make sure that in the digital age that this continues. so while this involves new technology, it is traditional. how can we keep the threads of knowledge going and make sure they are accessible to the public? >> we hope this becomes a central library but a model for other libraries to digitize their holdings. wouldn't it be great to have everything at harvard and princeton and yale online? and if you have enough interest that you can get ahold of it without having to go and be admitted to one of these great libraries and institutions that is what we are looking for. the wikipedia generation to
make new and different access to it, but also to have computers have access to it so you can have new patterns whether word use over time or interesting work that's gone on and understanding corporate funding and how does that influence research. that type of thing you need to be able to do researching. that kind of computer engagement with these materials is now possible. we are trying to make it so that people can dive in. whether onesie or twosie, or whether they are finding new patterns or finding the next generation of google. i'm a geek. i went to mit and studied artificial intelligence. we thought about building a global brain and we thought at least it had better have read all of
the good books. >> if we start with books, where are we? >> first you have to scope the problem. how big is it? if you want to put all of the published works online how big a problem is it? we don't know. the largest print library in the world is the library of congress. 26 million volumes. it's by far the biggest in the world. a book is about a megabyte. so 26 million megabytes is 26 terabytes. mega, giga, tera. it's about this big. and it fits in a system about this big and it costs about $60,000 so for the cost of a house, or around here a garage, you can have spinning all of the words in the library of congress.
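the storage estimate above can be checked with a few lines of arithmetic. this is only a sketch of the speaker's back-of-the-envelope reasoning: the 26 million volumes and the one megabyte per book are the round figures given in the talk, decimal prefixes are assumed, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the estimate in the talk.
# Both inputs are the speaker's round figures, not measured values.
VOLUMES = 26_000_000      # volumes in the Library of Congress, as stated
MB_PER_BOOK = 1           # roughly one megabyte of text per book, as stated
MB_PER_TB = 1_000_000     # assumed decimal prefixes: mega -> giga -> tera


def total_terabytes(volumes: int, mb_per_book: float) -> float:
    """Return the total text size in terabytes (hypothetical helper)."""
    return volumes * mb_per_book / MB_PER_TB


if __name__ == "__main__":
    print(total_terabytes(VOLUMES, MB_PER_BOOK), "terabytes")
```

this counts only the words of each book; scanned page images would take far more space than the text alone.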
>> we can continue to work on this for decades and it will never be a boring, dull day. the internet is a fun vibrant environment to be a part of. the internet archive is about 150 people. 100 people work in the scanning centers scanning in eight countries scanning books mostly. and there are 40 of us that work here in san francisco that are programmers, administrators, librarians, and we are the ones bringing the collections together. >> we would like to say that we bought this building because it matches the logo. we have columns on our logo but this really was a building that was available. san francisco is overcrowded and
in san francisco it's hard to find spaces. this space seemed perfect and we were able to come in and be under one roof. this was the place of worship for the christian scientists. these are servers. every time you see a light blinking that is data coming in or out. these are our terra cotta soldiers. when someone has worked with the archives for three years, brewster commissions a statue. there is one of these replicas of anyone that's been with the organization for three years. i'm in the second row on the left from where you are there. really the power of the archive is the internet and number of people out there that are doing wonderful projects.
they are often putting things on commercial hosts and archiving those through and putting them out through the wayback machine. we are a small hub but the real work going on figuring out how it should be archived or presented is going on by thousands and thousands of enthusiasts. they make sure that they want their grandfather's works still around and they want these particular passions. how can we go and take the work of these people and make sure that it endures? endures for centuries. >> you can explore the digital materials at archive.org. links on the main page take you to the collections.
by searching key words you can find key films and publications created by the united states government. ♪ ♪ >> our story begins with a young lady named margaret oliver. you ought to meet her because she is working for four out of every five americans. you see she is your representative at the social security administration. her job is to tell you what your rights are. and politics and programming. you can join the conversation on
social media sites. this november 22nd marks the 50th anniversary of president kennedy's assassination. we will look back on the president's policies. he stood before the united nations general assembly for what would be his last address to that body. on september 20th, 1963 he noted that the delegates were meeting in an atmosphere of rising hope. he spoke of nuclear arms, and the space race evolving into joint exploration with the soviet union. this 18 minute speech is shown courtesy of the visual library. mr. president, as one who has taken some interest in the